Abstract

In the linear model $y = X\beta + e$ with normally distributed errors, we obtain the generalized least squares (GLS), restricted GLS (RGLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators of the regression parameter vector $\beta$ when the covariance structure is known. We compare the quadratic risks of these estimators and propose dominance orders for the five estimators.
1 Introduction


The most important model belonging to the class of general linear hypotheses 
is the multiple regression model. The general purpose of multiple regression is 
to learn more about the relationship between several independent or predictor
variables and a dependent or criterion variable. 


To deal with a common multiple regression equation, consider the linear model
$$y = X\beta + e, \tag{1.1}$$
where $y$ is an $n$-vector of responses, $X$ is an $n \times p$ design matrix of full rank $p$, $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p$-vector of regression coefficients and $e = (e_1, \ldots, e_n)'$ is the $n$-vector of errors, distributed as multivariate normal with location parameter zero and positive definite (p.d.) covariance matrix $\Sigma$, denoted by $e \sim N_n(0, \Sigma)$.

It follows directly that
$$y \sim N_n(X\beta, \Sigma). \tag{1.2}$$


Let us assume that, in addition to the sample information $y$ in the model (1.1), information also exists in the form of $q$ independent linear hypotheses about the unknown parameter vector $\beta$, where $q \le p$. These general restrictions can be written as
$$H\beta = h, \tag{1.3}$$
where $H$ is a known $q \times p$ hypothesis design matrix of rank $q$ and $h$ is a $q \times 1$ vector of prespecified hypothetical values.

The estimation of the parameters of the multiple regression model is of common interest to many users. Often the properties of the estimators are of prime concern. Which statistical property of an estimator matters most depends on the objective of the study, and the choice of a particular estimator may well be determined by the aim of the end user. It is well known that the ordinary least squares estimators are best linear unbiased. However, if the objective of a study is to minimize some specific risk function, then other types of estimators perform better than the ordinary least squares estimator. The primary objective of this paper is to estimate $\beta$ when the p.d. covariance matrix $\Sigma$ is known, under the subspace restriction (1.3), and then to obtain shrinkage estimators of $\beta$ using the likelihood ratio test (LRT) statistic for (1.3). For a complete review of this problem in the special case $\Sigma = \sigma^2 I$, for both known and unknown $\sigma^2$ and for $\sigma^2$ following an inverse gamma distribution, see Saleh and Han [9], Tabatabaey [13], Khan [5, 6], Srivastava and Saleh [12] and Saleh [10].


2 Estimation 


Given classical conditions (see Kuan [7]), it is well known that for a known p.d. covariance matrix $\Sigma$, the generalized least squares (GLS) estimator of $\beta$ is
$$\hat\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y. \tag{2.1}$$


Using the method of Lagrange multipliers under the constraint $H_0 : H\beta = h$, the restricted GLS (RGLS) estimator $\tilde\beta$ of $\beta$ is given by
$$\tilde\beta = \hat\beta - (X'\Sigma^{-1}X)^{-1}H'[H(X'\Sigma^{-1}X)^{-1}H']^{-1}(H\hat\beta - h). \tag{2.2}$$
See Ravishanker and Dey [8].
Let $G_1 = (X'\Sigma^{-1}X)^{-1}$ and $G_2 = [HG_1H']^{-1}$; then (2.2) simplifies to
$$\tilde\beta = \hat\beta - G_1H'G_2(H\hat\beta - h). \tag{2.3}$$
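The two estimators in (2.1) and (2.3) can be sketched numerically as follows. This is only an illustration: the function name `gls_and_restricted`, the use of NumPy and the toy data are assumptions made here and are not part of the paper.

```python
import numpy as np

def gls_and_restricted(X, y, Sigma, H, h):
    """Return (beta_hat, beta_tilde, G1, G2) following (2.1)-(2.3)."""
    Sigma_inv = np.linalg.inv(Sigma)
    G1 = np.linalg.inv(X.T @ Sigma_inv @ X)      # G1 = (X' Sigma^{-1} X)^{-1}
    beta_hat = G1 @ X.T @ Sigma_inv @ y          # GLS estimator (2.1)
    G2 = np.linalg.inv(H @ G1 @ H.T)             # G2 = [H G1 H']^{-1}
    # Restricted GLS estimator (2.3)
    beta_tilde = beta_hat - G1 @ H.T @ G2 @ (H @ beta_hat - h)
    return beta_hat, beta_tilde, G1, G2

# Hypothetical toy data (not from the paper): n = 6, p = 3, q = 1.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
Sigma = 0.5 * np.eye(6) + 0.5 * np.ones((6, 6))  # a known p.d. covariance matrix
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=6)
H = np.array([[0.0, 1.0, 1.0]])                  # restriction: beta_2 + beta_3 = 1
h = np.array([1.0])

beta_hat, beta_tilde, G1, G2 = gls_and_restricted(X, y, Sigma, H, h)
```

By construction the restricted estimate satisfies the constraint exactly, that is, `H @ beta_tilde` equals `h` up to rounding.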


Now we consider the linear hypothesis $H\beta = h$ in (1.3) and obtain the test statistic for the null hypothesis $H_0 : H\beta = h$.

Let $\omega = \{\beta : \beta \in \mathbb{R}^p,\ H\beta = h,\ \Sigma > 0\}$ and $\Omega = \{\beta : \beta \in \mathbb{R}^p,\ \Sigma > 0\}$; then the likelihood ratio test statistic for this hypothesis is
$$\frac{\max_{\beta \in \omega} L(\beta, \Sigma)}{\max_{\beta \in \Omega} L(\beta, \Sigma)} = \frac{\exp\{-\frac{1}{2}(y - X\tilde\beta)'\Sigma^{-1}(y - X\tilde\beta)\}}{\exp\{-\frac{1}{2}(y - X\hat\beta)'\Sigma^{-1}(y - X\hat\beta)\}} = \exp\{-\tfrac{1}{2}(H\hat\beta - h)'G_2(H\hat\beta - h)\},$$
which is a decreasing function of $\chi = (H\hat\beta - h)'G_2(H\hat\beta - h)$.

Let $u = G_2^{1/2}(H\hat\beta - h)$; then using (1.2), $\chi = u'u$ has a non-central chi-square distribution with $q$ degrees of freedom and noncentrality parameter $\mu'\mu/2$, where $\mu = G_2^{1/2}(H\beta - h)$.

Bancroft [2] defined the preliminary test estimator (PTE) of $\beta$ as a convex combination of $\hat\beta$ and $\tilde\beta$ by
$$\hat\beta^{PT} = \tilde\beta + [1 - I(\chi < \chi^2_q(\alpha))](\hat\beta - \tilde\beta), \tag{2.4}$$
where $I(A)$ is the indicator of the set $A$ and $\chi^2_q(\alpha)$ is the upper $100\alpha$ percentile of the central $\chi^2$ distribution with $q$ degrees of freedom.

The PTE has the disadvantage that it depends on $\alpha$ ($0 < \alpha < 1$), the level of significance, and that it yields only the extreme results, namely $\hat\beta$ or $\tilde\beta$, depending on the outcome of the test. Therefore we define an intermediate value, the Stein-type shrinkage estimator (SE) of $\beta$, by
$$\hat\beta^{S} = \tilde\beta + (1 - \rho\chi^{-1})(\hat\beta - \tilde\beta), \tag{2.5}$$
where
$$\rho = \frac{(q-2)(n-p)}{q(n-p+2)}, \qquad q \ge 3. \tag{2.6}$$


The SE has the disadvantage that it behaves strangely for small values of $\chi$: the shrinkage factor $(1 - \rho\chi^{-1})$ becomes negative for $\chi < \rho$. Hence we define a better estimator, the positive-rule shrinkage estimator (PRSE) of $\beta$, as
$$\hat\beta^{S+} = \tilde\beta + (1 - \rho\chi^{-1})I[\chi > \rho](\hat\beta - \tilde\beta) = \hat\beta^{S} - (1 - \rho\chi^{-1})I[\chi \le \rho](\hat\beta - \tilde\beta). \tag{2.7}$$
Note that this estimator is a convex combination of $\hat\beta$ and $\tilde\beta$.
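The three combination rules (2.4), (2.5) and (2.7) differ only in the weight placed on the difference between the GLS and RGLS estimates. The sketch below makes that explicit; the function name, the argument names and the numbers are hypothetical, and the critical value of the central chi-square distribution is assumed to be supplied by the caller.

```python
def shrinkage_estimators(beta_hat, beta_tilde, chi, rho, crit):
    """Combine GLS and RGLS estimates as in (2.4), (2.5) and (2.7).

    chi  : observed test statistic (H b - h)' G2 (H b - h), assumed > 0
    rho  : shrinkage constant from (2.6)
    crit : critical value of the central chi-square distribution with q d.f.
           at level alpha (supplied by the caller, e.g. from a table)
    """
    diff = [bh - bt for bh, bt in zip(beta_hat, beta_tilde)]

    def combine(weight):
        # beta_tilde + weight * (beta_hat - beta_tilde)
        return [bt + weight * d for bt, d in zip(beta_tilde, diff)]

    w_pt = 0.0 if chi < crit else 1.0   # 1 - I(chi < crit)
    w_s = 1.0 - rho / chi               # negative when chi < rho
    w_prs = max(w_s, 0.0)               # (1 - rho/chi) I(chi > rho)
    return {"PT": combine(w_pt), "S": combine(w_s), "PRS": combine(w_prs)}

# Hypothetical numbers, chosen only to show the sign change of the Stein weight.
est = shrinkage_estimators([2.0, 1.0], [1.0, 1.0], chi=0.5, rho=1.0, crit=3.84)
```

With these inputs the Stein weight equals -1, so the SE overshoots past the restricted estimate, while the PRSE and the PTE both return the restricted estimate.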


The quadratic risk functions of the estimators are given in the following section, and the dominance properties are studied in Section 4.
3 Risk Evaluations


For a given non-singular matrix $W$, consider the weighted quadratic error loss function of the form
$$L(\beta^*;\beta) = (\beta^* - \beta)'W(\beta^* - \beta), \tag{3.1}$$
where $\beta^*$ is any estimator of $\beta$. Then the weighted quadratic risk function associated with (3.1) is defined as
$$R(\beta^*;\beta) = E[(\beta^* - \beta)'W(\beta^* - \beta)]. \tag{3.2}$$
In this section, using the risk function (3.2), we evaluate the quadratic risks of the five estimators under study.


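Direct computations using (1.2), (2.1) and (3.2) lead to
$$R(\hat\beta;\beta) = \mathrm{tr}(G_1W). \tag{3.3}$$
Let $\delta = G_1H'G_2(H\beta - h)$; then using (2.3) we have
$$R(\tilde\beta;\beta) = \mathrm{tr}\{W[G_1(I_p - H'G_2HG_1) + \delta\delta']\} = \mathrm{tr}(G_1W) - \mathrm{tr}\{W[G_1H'G_2HG_1]\} + \delta'W\delta. \tag{3.4}$$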


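Note that $R = G_1^{1/2}H'G_2HG_1^{1/2}$ is a symmetric idempotent matrix of rank $q \le p$. Thus, there exists an orthogonal matrix $Q$ ($Q'Q = I_p$) (see Judge and Bock [4]) such that
$$QRQ' = \begin{pmatrix} I_q & 0 \\ 0 & 0 \end{pmatrix}, \tag{3.5}$$
$$QG_1^{1/2}WG_1^{1/2}Q' = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}. \tag{3.6}$$
The matrices $A_{11}$ and $A_{22}$ are of orders $q$ and $p - q$, respectively.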


Define the random variable
$$\omega = QG_1^{-1/2}\hat\beta - QG_1^{1/2}H'G_2h; \tag{3.7}$$
then
$$\omega \sim N_p(\eta, I_p), \tag{3.8}$$
where
$$\eta = QG_1^{-1/2}\beta - QG_1^{1/2}H'G_2h. \tag{3.9}$$
Partitioning the vectors as $\omega = (\omega_1', \omega_2')'$ and $\eta = (\eta_1', \eta_2')'$, where $\omega_1$ and $\omega_2$ are independent sub-vectors of orders $q$ and $p - q$ respectively, we obtain
$$\hat\beta - \beta = G_1^{1/2}Q'(\omega - \eta). \tag{3.10}$$
Using (3.7) we can obtain
$$\chi = \omega_1'\omega_1, \qquad \theta = \eta_1'\eta_1 = (H\beta - h)'G_2(H\beta - h). \tag{3.11}$$
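Now, we may write
$$\mathrm{tr}\{W[G_1H'G_2HG_1]\} = \mathrm{tr}\{QG_1^{1/2}WG_1^{1/2}Q'\,QRQ'\} = \mathrm{tr}\left\{\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} I_q & 0 \\ 0 & 0 \end{pmatrix}\right\} = \mathrm{tr}(A_{11}). \tag{3.12}$$
Using (3.11) we have
$$\delta'W\delta = (H\beta - h)'G_2HG_1WG_1H'G_2(H\beta - h) = \eta_1'A_{11}\eta_1. \tag{3.13}$$
Therefore, we obtain
$$R(\tilde\beta;\beta) = \mathrm{tr}(G_1W) - \mathrm{tr}(A_{11}) + \eta_1'A_{11}\eta_1. \tag{3.14}$$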
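Using (2.4),
$$\begin{aligned}
R(\hat\beta^{PT};\beta) &= E[(\hat\beta^{PT} - \beta)'W(\hat\beta^{PT} - \beta)] \\
&= E\{[(\hat\beta - \beta) - I(\chi < \chi^2_q(\alpha))(\hat\beta - \tilde\beta)]'W[(\hat\beta - \beta) - I(\chi < \chi^2_q(\alpha))(\hat\beta - \tilde\beta)]\} \\
&= E[(\hat\beta - \beta)'W(\hat\beta - \beta)] - 2E[I(\chi < \chi^2_q(\alpha))(\hat\beta - \tilde\beta)'W(\hat\beta - \beta)] \\
&\quad + E[I(\chi < \chi^2_q(\alpha))(\hat\beta - \tilde\beta)'W(\hat\beta - \tilde\beta)].
\end{aligned} \tag{3.15}$$
Using (3.7)-(3.11) and (3.15),
$$\begin{aligned}
R(\hat\beta^{PT};\beta) &= \mathrm{tr}(G_1W) - E[\omega_1'A_{11}\omega_1 I(\chi < \chi^2_q(\alpha))] - 2E[\omega_2'A_{21}\omega_1 I(\chi < \chi^2_q(\alpha))] \\
&\quad + 2\eta_1'A_{11}E[\omega_1 I(\chi < \chi^2_q(\alpha))] + 2\eta_2'A_{21}E[\omega_1 I(\chi < \chi^2_q(\alpha))];
\end{aligned} \tag{3.16}$$
because $\omega_1$ and $\omega_2$ are independent,
$$E[\omega_2'A_{21}\omega_1 I(\chi < \chi^2_q(\alpha))] = \eta_2'A_{21}E[\omega_1 I(\chi < \chi^2_q(\alpha))]; \tag{3.17}$$
using Lemma 1 in the Appendix with $\phi(\chi)$ the indicator function of $\{\chi < \chi^2_q(\alpha)\}$, we get
$$R(\hat\beta^{PT};\beta) = \mathrm{tr}(G_1W) - \chi^2_{q+2,\theta}(\alpha)\,\mathrm{tr}(A_{11}) + [2\chi^2_{q+2,\theta}(\alpha) - \chi^2_{q+4,\theta}(\alpha)]\,\eta_1'A_{11}\eta_1, \tag{3.18}$$
where $\chi^2_{\nu,\theta}(\alpha)$ denotes the probability that a non-central chi-square variable with $\nu$ degrees of freedom is less than $\chi^2_q(\alpha)$.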
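As a rough numerical companion to (3.18), the sketch below evaluates the PT risk by writing the non-central chi-square distribution function as a Poisson mixture of central ones, with Poisson mean $\theta/2$ as in the Appendix. All function names are hypothetical, the critical value `crit` is assumed to be supplied by the caller, and the fixed truncation of the mixture is an assumption that is only adequate for moderate $\theta$.

```python
import math

def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def ncx2_cdf(x, df, theta, terms=200):
    """Non-central chi-square CDF as a Poisson(theta/2) mixture of central CDFs."""
    lam = theta / 2.0
    weight = math.exp(-lam)           # Poisson weight for r = 0
    total = 0.0
    for r in range(terms):
        total += weight * chi2_cdf(x, df + 2 * r)
        weight *= lam / (r + 1)       # Poisson weight for r + 1
    return total

def risk_pt(tr_G1W, tr_A11, eta_A_eta, theta, q, crit):
    """Risk of the preliminary test estimator, following (3.18)."""
    h2 = ncx2_cdf(crit, q + 2, theta)
    h4 = ncx2_cdf(crit, q + 4, theta)
    return tr_G1W - h2 * tr_A11 + (2.0 * h2 - h4) * eta_A_eta
```

Two limiting cases give a quick sanity check: a critical value of zero never accepts the restriction, so the risk equals that of the GLS estimator in (3.3), while a very large critical value under $\theta = 0$ reproduces the RGLS risk in (3.14).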
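Using (2.5) and (2.6),
$$\begin{aligned}
R(\hat\beta^{S};\beta) &= E[(\hat\beta^{S} - \beta)'W(\hat\beta^{S} - \beta)] \\
&= E\{[(\hat\beta - \beta) - \rho\chi^{-1}(\hat\beta - \tilde\beta)]'W[(\hat\beta - \beta) - \rho\chi^{-1}(\hat\beta - \tilde\beta)]\} \\
&= E[(\hat\beta - \beta)'W(\hat\beta - \beta)] - 2\rho E[\chi^{-1}(\hat\beta - \tilde\beta)'W(\hat\beta - \beta)] \\
&\quad + \rho^2E[\chi^{-2}(\hat\beta - \tilde\beta)'W(\hat\beta - \tilde\beta)];
\end{aligned} \tag{3.19}$$
using (3.7)-(3.11) and (3.19),
$$R(\hat\beta^{S};\beta) = \mathrm{tr}(G_1W) - 2\rho E[\chi^{-1}(\omega_1'A_{11}\omega_1 - \eta_1'A_{11}\omega_1 + \omega_2'A_{21}\omega_1 - \eta_2'A_{21}\omega_1)] + \rho^2E[\chi^{-2}(\omega_1'A_{11}\omega_1)]. \tag{3.20}$$
Using Lemma 1 in the Appendix for $\phi(\chi) = \chi^{-1}$, we have
$$E[\chi^{-1}\eta_1'A_{11}\omega_1] = \eta_1'A_{11}\eta_1\,E\left[\frac{1}{\chi^2_{q+2,\theta}}\right], \tag{3.21}$$
$$E[\chi^{-1}\omega_1'A_{11}\omega_1] = E\left[\frac{1}{\chi^2_{q+2,\theta}}\right]\mathrm{tr}(A_{11}) + E\left[\frac{1}{\chi^2_{q+4,\theta}}\right]\eta_1'A_{11}\eta_1. \tag{3.22}$$
Using Lemma 1 in the Appendix for $\phi(\chi) = \chi^{-2}$, we have
$$E[\chi^{-2}\omega_1'A_{11}\omega_1] = E\left[\frac{1}{(\chi^2_{q+2,\theta})^2}\right]\mathrm{tr}(A_{11}) + E\left[\frac{1}{(\chi^2_{q+4,\theta})^2}\right]\eta_1'A_{11}\eta_1. \tag{3.23}$$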
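Using (3.21)-(3.23) one can obtain
$$\begin{aligned}
R(\hat\beta^{S};\beta) &= \mathrm{tr}(G_1W) - \rho\left\{2E\left[\frac{1}{\chi^2_{q+2,\theta}}\right] - \rho E\left[\frac{1}{(\chi^2_{q+2,\theta})^2}\right]\right\}\mathrm{tr}(A_{11}) \\
&\quad + \rho\left\{2E\left[\frac{1}{\chi^2_{q+2,\theta}}\right] - 2E\left[\frac{1}{\chi^2_{q+4,\theta}}\right] + \rho E\left[\frac{1}{(\chi^2_{q+4,\theta})^2}\right]\right\}\eta_1'A_{11}\eta_1.
\end{aligned} \tag{3.24}$$
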
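Finally, the risk of the PRSE is given by
$$\begin{aligned}
R(\hat\beta^{S+};\beta) &= E[(\hat\beta^{S+} - \beta)'W(\hat\beta^{S+} - \beta)] \\
&= E\{[(\hat\beta^{S} - \beta) - (1 - \rho\chi^{-1})I(\chi < \rho)(\hat\beta - \tilde\beta)]'W[(\hat\beta^{S} - \beta) - (1 - \rho\chi^{-1})I(\chi < \rho)(\hat\beta - \tilde\beta)]\} \\
&= R(\hat\beta^{S};\beta) + E[(1 - \rho\chi^{-1})^2I(\chi < \rho)(\hat\beta - \tilde\beta)'W(\hat\beta - \tilde\beta)] \\
&\quad - 2E[(\hat\beta^{S} - \beta)'W(1 - \rho\chi^{-1})I(\chi < \rho)(\hat\beta - \tilde\beta)].
\end{aligned} \tag{3.25}$$
But using (2.5),
$$\begin{aligned}
E[(\hat\beta^{S} - \beta)'W(1 - \rho\chi^{-1})I(\chi < \rho)(\hat\beta - \tilde\beta)] &= E\{[(\tilde\beta - \beta) + (1 - \rho\chi^{-1})(\hat\beta - \tilde\beta)]'W(1 - \rho\chi^{-1})I(\chi < \rho)(\hat\beta - \tilde\beta)\} \\
&= E[(\tilde\beta - \beta)'W(1 - \rho\chi^{-1})I(\chi < \rho)(\hat\beta - \tilde\beta)] \\
&\quad + E[(1 - \rho\chi^{-1})^2I(\chi < \rho)(\hat\beta - \tilde\beta)'W(\hat\beta - \tilde\beta)].
\end{aligned} \tag{3.26}$$
Thus, we obtain
$$\begin{aligned}
R(\hat\beta^{S+};\beta) &= R(\hat\beta^{S};\beta) - E[(1 - \rho\chi^{-1})^2I(\chi < \rho)(\hat\beta - \tilde\beta)'W(\hat\beta - \tilde\beta)] \\
&\quad - 2E[(\tilde\beta - \beta)'W(1 - \rho\chi^{-1})I(\chi < \rho)(\hat\beta - \tilde\beta)].
\end{aligned} \tag{3.27}$$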
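Using (3.7)-(3.11), the independence of $\omega_1$ and $\omega_2$, and Lemma 1 in the Appendix for $\phi(\chi) = (1 - \rho\chi^{-1})^iI(\chi < \rho)$ ($i = 1, 2$), and noting that $(\tilde\beta - \beta)'W(\hat\beta - \tilde\beta) = -\eta_1'A_{11}\omega_1 + (\omega_2 - \eta_2)'A_{21}\omega_1$, we get
$$\begin{aligned}
R(\hat\beta^{S+};\beta) &= R(\hat\beta^{S};\beta) - E\left[\left(1 - \frac{\rho}{\chi^2_{q+2,\theta}}\right)^2I(\chi^2_{q+2,\theta} < \rho)\right]\mathrm{tr}(A_{11}) \\
&\quad - E\left[\left(1 - \frac{\rho}{\chi^2_{q+4,\theta}}\right)^2I(\chi^2_{q+4,\theta} < \rho)\right]\eta_1'A_{11}\eta_1 \\
&\quad + 2E\left[\left(1 - \frac{\rho}{\chi^2_{q+2,\theta}}\right)I(\chi^2_{q+2,\theta} < \rho)\right]\eta_1'A_{11}\eta_1.
\end{aligned} \tag{3.28}$$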


4 Comparison 
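To compare the risks of the estimators for the weight matrix $W$, note that (see e.g. Searle [11])
$$\theta\,ch_1(A_{11}) \le \eta_1'A_{11}\eta_1 \le \theta\,ch_q(A_{11}), \tag{4.1}$$
where $ch_1(A_{11})$ and $ch_q(A_{11})$ are the minimum and maximum eigenvalues of $A_{11}$, respectively. Then by (3.3) and (3.14) one may easily see that
$$R(\hat\beta;\beta) - \mathrm{tr}(A_{11}) + \theta\,ch_1(A_{11}) \le R(\tilde\beta;\beta) \le R(\hat\beta;\beta) - \mathrm{tr}(A_{11}) + \theta\,ch_q(A_{11}).$$
By (3.11) and these bounds, under the null hypothesis $H_0 : H\beta = h$ (that is, $\theta = 0$) we conclude
$$R(\tilde\beta;\beta) \le R(\hat\beta;\beta).$$
In general, by the same bounds, $\tilde\beta$ performs better than $\hat\beta$ whenever
$$\theta \le \frac{\mathrm{tr}(A_{11})}{ch_q(A_{11})} = \frac{\sum_{i=1}^{q}ch_i(A_{11})}{ch_q(A_{11})} \le q.$$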
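Using (3.3) and (3.18) we have
$$R(\hat\beta^{PT};\beta) - R(\hat\beta;\beta) = [2\chi^2_{q+2,\theta}(\alpha) - \chi^2_{q+4,\theta}(\alpha)]\,\eta_1'A_{11}\eta_1 - \chi^2_{q+2,\theta}(\alpha)\,\mathrm{tr}(A_{11}). \tag{4.2}$$
Therefore $\hat\beta^{PT}$ performs better than $\hat\beta$ whenever
$$\theta \le \frac{\mathrm{tr}(A_{11})}{ch_q(A_{11})} \times \frac{\chi^2_{q+2,\theta}(\alpha)}{2\chi^2_{q+2,\theta}(\alpha) - \chi^2_{q+4,\theta}(\alpha)}. \tag{4.3}$$
When $W = X'\Sigma^{-1}X$ we have $\mathrm{tr}(A_{11}) = q$ in (4.3).
Also, under the null hypothesis $H_0$, since (4.2) is negative for all $\alpha$, $\hat\beta^{PT}$ performs better than $\hat\beta$.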
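Using (3.14), (3.18) and the corresponding risk difference, we conclude that $\hat\beta^{PT}$ performs better than $\tilde\beta$ whenever
$$\theta \ge \frac{[1 - \chi^2_{q+2,\theta}(\alpha)]\,\mathrm{tr}(A_{11})}{[1 - 2\chi^2_{q+2,\theta}(\alpha) + \chi^2_{q+4,\theta}(\alpha)]\,ch_1(A_{11})}. \tag{4.4}$$
Thus, the dominance order of the three estimators $\hat\beta$, $\tilde\beta$ and $\hat\beta^{PT}$ under the null hypothesis $H_0$ is given by
$$\tilde\beta \succ \hat\beta^{PT} \succ \hat\beta,$$
where the notation $\succ$ means "dominates".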


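Under the null hypothesis,
$$R(\hat\beta^{S};\beta) - R(\hat\beta;\beta) = -\rho\,\mathrm{tr}(A_{11})\,\frac{2(q-2) - \rho}{q(q-2)}.$$
By direct computation, using the fact that $n > p$, we get $\rho < 2(q-2)$. Therefore, the risk difference $R(\hat\beta^{S};\beta) - R(\hat\beta;\beta)$ is negative and $\hat\beta^{S} \succ \hat\beta$ uniformly.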
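The size of this risk reduction is easy to evaluate exactly. The sketch below uses rational arithmetic and assumes that (2.6) reads $\rho = (q-2)(n-p)/[q(n-p+2)]$; both function names are hypothetical, and $\mathrm{tr}(A_{11}) = q$ corresponds to the choice $W = X'\Sigma^{-1}X$.

```python
from fractions import Fraction

def stein_constant(n, p, q):
    """Shrinkage constant rho, ASSUMING (2.6) reads (q-2)(n-p) / (q(n-p+2))."""
    return Fraction((q - 2) * (n - p), q * (n - p + 2))

def stein_risk_gain_under_h0(rho, q, tr_A11):
    """R(S) - R(GLS) at theta = 0: -rho tr(A11) [2(q-2) - rho] / (q(q-2))."""
    return -rho * tr_A11 * (2 * (q - 2) - rho) / (q * (q - 2))

# The setting n = 20, p = 5, q = 3 with tr(A11) = q = 3.
rho = stein_constant(n=20, p=5, q=3)
gain = stein_risk_gain_under_h0(rho, q=3, tr_A11=Fraction(3))
```

Under that reading of (2.6), this setting gives $\rho = 5/17$, which is well below $2(q-2) = 2$, and a risk difference of $-145/289$, so the SE improves on the GLS estimator at $\theta = 0$.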


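Under the null hypothesis $H_0$, we have
$$R(\hat\beta^{S};\beta) = R(\tilde\beta;\beta) + \mathrm{tr}(A_{11})\,f(n, q, \rho),$$
where
$$f(n, q, \rho) = 1 - \frac{\rho[2(q-2) - \rho]}{q(q-2)}.$$
The function $f(n, q, \rho)$ is positive for $q \ge 3$. Thus $R(\hat\beta^{S};\beta) \ge R(\tilde\beta;\beta)$. However, as $\eta_1$ moves away from $0$, $\eta_1'A_{11}\eta_1$ increases and the risk of $\tilde\beta$ becomes unbounded while the risk of $\hat\beta^{S}$ remains below the risk of $\hat\beta$; thus $\hat\beta^{S}$ dominates $\tilde\beta$ outside an interval around the origin.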
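Comparing $\hat\beta^{S}$ and $\hat\beta^{PT}$ under $H_0$, we get
$$\begin{aligned}
R(\hat\beta^{S};\beta) &= R(\hat\beta^{PT};\beta) + \left[\chi^2_{q+2,0}(\alpha) - 2\rho E\left[\frac{1}{\chi^2_{q+2,0}}\right] + \rho^2E\left[\frac{1}{(\chi^2_{q+2,0})^2}\right]\right]\mathrm{tr}(A_{11}) \\
&= R(\hat\beta^{PT};\beta) + \left[\chi^2_{q+2,0}(\alpha) - \frac{2\rho}{q} + \frac{\rho^2}{q(q-2)}\right]\mathrm{tr}(A_{11}) \\
&\ge R(\hat\beta^{PT};\beta)
\end{aligned}$$
for all $\alpha$ such that $l = \chi^2_{q+2,0}(\alpha) - \frac{2\rho}{q} + \frac{\rho^2}{q(q-2)} \ge 0$, and $R(\hat\beta^{S};\beta) < R(\hat\beta^{PT};\beta)$ for all $\alpha$ such that $l < 0$.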


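Because $\omega_1$ is independent of $\omega_2$, we get from (3.27)
$$R(\hat\beta^{S+};\beta) - R(\hat\beta^{S};\beta) = -E[(1 - \rho\chi^{-1})^2I(\chi < \rho)\,\omega_1'A_{11}\omega_1] + 2E[(1 - \rho\chi^{-1})I(\chi < \rho)\,\eta_1'A_{11}\omega_1]. \tag{4.5}$$
Note that, since $1 - \rho/\chi^2_{q+2,\theta} \le 0$ whenever $\chi^2_{q+2,\theta} \le \rho$, we have
$$E\left[\left(1 - \frac{\rho}{\chi^2_{q+2,\theta}}\right)I(\chi^2_{q+2,\theta} \le \rho)\right] \le 0.$$
Moreover, the expectation of a positive random variable is positive, so the risk difference in (4.5) is negative. Therefore, for all $\beta$, $\hat\beta^{S+} \succ \hat\beta^{S}$ and, under $H_0$, $\tilde\beta \succ \hat\beta^{S+}$.
However, as $\eta_1$ moves away from $0$, $\eta_1'A_{11}\eta_1$ increases and the risk of $\tilde\beta$ becomes unbounded while the risk of $\hat\beta^{S+}$ remains below the risk of $\hat\beta$; thus $\hat\beta^{S+}$ dominates $\tilde\beta$ outside an interval around the origin.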

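Under the conditions given above, the dominance order of the five estimators of $\beta$ falls into one of the following two orders:
$$1.\quad \tilde\beta \succ \hat\beta^{S+} \succ \hat\beta^{S} \succ \hat\beta^{PT} \succ \hat\beta \tag{4.6}$$
and
$$2.\quad \tilde\beta \succ \hat\beta^{PT} \succ \hat\beta^{S+} \succ \hat\beta^{S} \succ \hat\beta. \tag{4.7}$$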
5 Illustrative Example


For an illustrative example of domination orders of five estimators under study, 


we proceed with numerical and graphical examples. 


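Numerical Example To illustrate the domination orders given in the previous section, we work through a numerical example from Searle [11]. Suppose we have the following five sets of observations (including $x_{i0} = 1$ for $i = 1, \ldots, 5$):
$$\begin{pmatrix} 62 \\ 60 \\ 57 \\ 48 \\ 23 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 6 \\ 1 & 9 & 10 \\ 1 & 6 & 4 \\ 1 & 3 & 13 \\ 1 & 5 & 2 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{pmatrix},$$
where the covariance structure of the error term has the form $\Sigma = \sigma^2R$ for
$R = (1 - \rho)I_5 + \rho J_5$, with $\rho = 0.5$ and $\sigma^2 = 2$, which satisfies the condition under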
which 57 oe ae 


Moreover, assume that we want to test the null hypothesis
$$H_0 : \begin{cases} \beta_2 = 0.5 \\ 2\beta_1 - \beta_2 + 3\beta_3 = 2 \\ \beta_1 = -1 \end{cases}$$
In this case we have
$$H = \begin{pmatrix} 0 & 1 & 0 \\ 2 & -1 & 3 \\ 1 & 0 & 0 \end{pmatrix}, \qquad h = \begin{pmatrix} 0.5 \\ 2 \\ -1 \end{pmatrix}.$$


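Direct algebraic computations lead to
$$G_1 = \begin{pmatrix} 2.64583 & -0.16667 & -0.08750 \\ -0.16667 & 0.03333 & 0.00000 \\ -0.08750 & 0.00000 & 0.01250 \end{pmatrix}, \qquad G_2 = \begin{pmatrix} 83.7037 & 23.1481 & -40.1852 \\ 23.1481 & 13.4259 & -24.9074 \\ -40.1852 & -24.9074 & 46.7593 \end{pmatrix}.$$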


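Using (2.1) and (2.2) we obtain
$$\hat\beta = \begin{pmatrix} 37.0 \\ 0.5 \\ 1.5 \end{pmatrix}, \qquad \tilde\beta = \begin{pmatrix} -1 \\ 0.5 \\ 1.5 \end{pmatrix} \qquad \text{and} \qquad \chi = 1203.3.$$
Here $\tilde\beta$ is the unique solution of $H\beta = h$, since $H$ is square and non-singular in this example.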
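These quantities can be recomputed exactly with rational arithmetic. The sketch below is an independent check, not the authors' computation: the helper functions are hypothetical, and the design matrix is the reading of the example data assumed here (columns: intercept, $x_1 = 2, 9, 6, 3, 5$ and $x_2 = 6, 10, 4, 13, 2$), with $\Sigma = 2[(1 - 0.5)I_5 + 0.5J_5]$.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with exact rational arithmetic."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)   # first non-zero pivot
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]                   # scale pivot row
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Assumed reading of the example data.
X = [[F(v) for v in row] for row in [(1, 2, 6), (1, 9, 10), (1, 6, 4), (1, 3, 13), (1, 5, 2)]]
y = [[F(v)] for v in (62, 60, 57, 48, 23)]
Sigma = [[F(2) if i == j else F(1) for j in range(5)] for i in range(5)]
H = [[F(0), F(1), F(0)], [F(2), F(-1), F(3)], [F(1), F(0), F(0)]]
h = [[F(1, 2)], [F(2)], [F(-1)]]

Xt_Si = matmul(transpose(X), inverse(Sigma))
G1 = inverse(matmul(Xt_Si, X))                         # (X' Sigma^{-1} X)^{-1}
beta_hat = matmul(G1, matmul(Xt_Si, y))                # GLS estimator (2.1)
G2 = inverse(matmul(matmul(H, G1), transpose(H)))      # [H G1 H']^{-1}
resid = [[a[0] - b[0]] for a, b in zip(matmul(H, beta_hat), h)]   # H beta_hat - h
shift = matmul(matmul(matmul(G1, transpose(H)), G2), resid)
beta_tilde = [[b[0] - s[0]] for b, s in zip(beta_hat, shift)]     # RGLS (2.3)
chi = matmul(matmul(transpose(resid), G2), resid)[0][0]           # test statistic
```

Under these assumptions the computation gives $\hat\beta = (37, 1/2, 3/2)'$, $\tilde\beta = (-1, 1/2, 3/2)'$ and $\chi = 3610/3 \approx 1203.3$, and the leading entry of $G_1$ is $127/48 \approx 2.64583$.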


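In this example $n = 5$, $p = 3$ and $q = 3$, and $\rho$ follows from (2.6). Then using (2.4), (2.5) and (2.7) we have
$$\hat\beta^{PT} = \begin{pmatrix} -1 \\ 0.5 \\ 1.5 \end{pmatrix} + [1 - I(1203.3 < \chi^2_3(\alpha))]\begin{pmatrix} 38 \\ 0 \\ 0 \end{pmatrix},$$
$$\hat\beta^{S+} = \hat\beta^{S} = \begin{pmatrix} -1 \\ 0.5 \\ 1.5 \end{pmatrix} + 0.9998\begin{pmatrix} 38 \\ 0 \\ 0 \end{pmatrix}.$$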
a 0 


In order to compare the risks of the above five estimators, suppose the weight matrix is given by
$$W = \begin{pmatrix} 0 & -1 & -1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Then from (3.12) we get $\mathrm{tr}(A_{11}) = 0.0125$.
Using (3.3) and (3.14) we have $R(\hat\beta;\beta) = 0.0125$ and, under $H_0$, $R(\tilde\beta;\beta) = 0$. Clearly $\tilde\beta$ performs better than $\hat\beta$ whenever $\theta \le 0.0125$. Using Lemma 2 from the Appendix we can determine the risk functions for different values of $\alpha$ and $\theta$. We also include large values of $\theta$ to allow better comparisons, although these result in unreasonably large risk values. The results are given in Table 1.


Table 1: Risks of the estimators


pe eae 


0.05 0.0044 
0.001 : : 0.0045 

0.1 ; : 0.0187 

. 0.3041 

: 37.1286 


From Table 1, it can easily be seen that

1. Under $H_0$ ($\theta = 0$), the domination order given in (4.6) holds.

2. For $\theta \le 0.001$, the risks of PRSE and SE are decreasing, and for $\theta \ge 0.1$ they become increasing.

3. For $\theta \ge 0.1$, GLSE performs better than both RGLSE and PTE, and PTE performs better than RGLSE.


Graphical Example Some graphical perspectives on the risks of the estimators $\hat\beta$, $\tilde\beta$, $\hat\beta^{PT}$, $\hat\beta^{S}$ and $\hat\beta^{S+}$ can be obtained using approximations of (3.24) and (3.28). In this approach, we use Lemma 2 in the Appendix to compute (3.34) and (3.35). Then, substituting suitable expressions in (3.24) and (3.28), we compute the risks approximately using the packages MATLAB release 7.2 and MAPLE release 9.5.
For the special case $n = 20$, $p = 5$ and $q = 3$, with $W = X'\Sigma^{-1}X$, the graphical displays are as follows. (Because changing the value of $\alpha$ in (3.18) makes no clearly visible difference in the graphs, we use only $\alpha = 0.3$. Note that as $\alpha$ increases, $R(\hat\beta^{PT};\beta)$ decreases.)
In Figure 1, the horizontal axis shows the values of $\theta$, and

R1 $= R(\hat\beta;\beta)$, R2 $= R(\tilde\beta;\beta)$, R3 $= R(\hat\beta^{PT};\beta)$, R4 $= R(\hat\beta^{S};\beta)$, R5 $= R(\hat\beta^{S+};\beta)$.

Figure 1: Risks Comparison
6 Appendix


Lemma 6.1 Assume the random vector $\omega$ is normally distributed with mean vector $\tau$ and covariance matrix $I_q$, and $A$ is any p.d. symmetric matrix. Assume $\phi(\cdot)$ is a Borel measurable function; then
$$E[\phi(\omega'\omega)\omega] = E[\phi(\chi^2_{q+2,\tau'\tau/2})]\,\tau,$$
$$E[\phi(\omega'\omega)\omega'A\omega] = E[\phi(\chi^2_{q+2,\tau'\tau/2})]\,\mathrm{tr}(A) + E[\phi(\chi^2_{q+4,\tau'\tau/2})]\,\tau'A\tau.$$

Proof. For the proof see Appendix B.2 in Judge and Bock [4].


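Lemma 6.2 Let $q$ be an integer greater than $2m$ ($q > 2m$); then
$$E\left[\left(\frac{1}{\chi^2_{q,\theta}}\right)^{m}I(\chi^2_{q,\theta} \le \rho)\right] = \sum_{r=0}^{\infty}\frac{e^{-\theta/2}(\theta/2)^r}{2^m\,r!}\,\frac{\Gamma(q/2 + r - m)}{\Gamma(q/2 + r)}\,\chi^2_{q+2r,0}(\rho).$$
Also
$$E\left[\left(1 - \frac{\rho}{\chi^2_{q,\theta}}\right)^{2}I(\chi^2_{q,\theta} \le \rho)\right] = \chi^2_{q,\theta/2}(\rho) + \sum_{r=0}^{\infty}\frac{\rho(\rho - 2q - 4r + 8)\,e^{-\theta/2}(\theta/2)^r\,\chi^2_{q+2r,0}(\rho)}{r!\,(q + 2r - 2)(q + 2r - 4)}.$$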


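Proof. Using the series expansion for the inverse non-central chi-square distribution (see Johnson and Kotz [3]), we have
$$E\left[(\chi^2_{q,\theta})^{-m}\right] = \sum_{r=0}^{\infty}\frac{e^{-\theta/2}(\theta/2)^r}{r!}\,E\left[(\chi^2_{q+2r,0})^{-m}\right] = \sum_{r=0}^{\infty}\frac{e^{-\theta/2}(\theta/2)^r}{2^m\,r!}\,\frac{\Gamma(q/2 + r - m)}{\Gamma(q/2 + r)}.$$
Thus we can obtain
$$\begin{aligned}
E\left[(\chi^2_{q,\theta})^{-m}I(\chi^2_{q,\theta} \le \rho)\right] &= \sum_{r=0}^{\infty}\frac{e^{-\theta/2}(\theta/2)^r}{r!}\,E\left[(\chi^2_{q+2r,0})^{-m}I(\chi^2_{q+2r,0} \le \rho)\right] \\
&= \sum_{r=0}^{\infty}\frac{e^{-\theta/2}(\theta/2)^r}{2^m\,r!}\,\frac{\Gamma(q/2 + r - m)}{\Gamma(q/2 + r)}\,\chi^2_{q+2r,0}(\rho).
\end{aligned}$$
Also
$$\begin{aligned}
E\left[\left(1 - \frac{\rho}{\chi^2_{q,\theta}}\right)^{2}I(\chi^2_{q,\theta} \le \rho)\right] &= \chi^2_{q,\theta/2}(\rho) - 2\rho E\left[\frac{1}{\chi^2_{q,\theta}}I(\chi^2_{q,\theta} \le \rho)\right] + \rho^2E\left[\left(\frac{1}{\chi^2_{q,\theta}}\right)^{2}I(\chi^2_{q,\theta} \le \rho)\right] \\
&= \chi^2_{q,\theta/2}(\rho) + \sum_{r=0}^{\infty}\frac{e^{-\theta/2}(\theta/2)^r}{r!\,\Gamma(q/2 + r)}\,\chi^2_{q+2r,0}(\rho)\left[\frac{\rho^2}{4}\Gamma(q/2 + r - 2) - \rho\,\Gamma(q/2 + r - 1)\right] \\
&= \chi^2_{q,\theta/2}(\rho) + \sum_{r=0}^{\infty}\frac{\rho(\rho - 2q - 4r + 8)\,e^{-\theta/2}(\theta/2)^r\,\chi^2_{q+2r,0}(\rho)}{r!\,(q + 2r - 2)(q + 2r - 4)}.
\end{aligned}$$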
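The first series in the proof, for the inverse moments of a non-central chi-square variable, can be evaluated directly. The sketch below is an illustration only: the function name, the stopping rule and the tolerance are assumptions made here, and the Poisson weights use mean $\theta/2$ as in the proof.

```python
import math

def inv_moment_ncx2(q, theta, m, tol=1e-14, max_terms=10_000):
    """E[(chi2_{q,theta})^{-m}] by the Poisson-mixture series, for q > 2m."""
    if q <= 2 * m:
        raise ValueError("the series requires q > 2m")
    lam = theta / 2.0
    weight = math.exp(-lam)            # Poisson weight for r = 0
    total = 0.0
    for r in range(max_terms):
        # Gamma(q/2 + r - m) / (2^m Gamma(q/2 + r)) = E[(chi2_{q+2r,0})^{-m}]
        log_c = math.lgamma(q / 2 + r - m) - math.lgamma(q / 2 + r) - m * math.log(2)
        term = weight * math.exp(log_c)
        total += term
        # Stop once past the Poisson mode and the terms are negligible.
        if r > lam and term < tol * total:
            break
        weight *= lam / (r + 1)        # Poisson weight for r + 1
    return total
```

For $\theta = 0$ the series reduces to its first term, giving $1/(q-2)$ for $m = 1$ and $1/[(q-2)(q-4)]$ for $m = 2$, the central values used in Section 4 with $q + 2$ in place of $q$.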